84 research outputs found

    Distributed Formal Concept Analysis Algorithms Based on an Iterative MapReduce Framework

    While many existing formal concept analysis algorithms are efficient, they are typically unsuitable for distributed implementation. Taking the MapReduce (MR) framework as our inspiration, we introduce a distributed approach for performing formal concept mining. The novelty of our method is that it uses a lightweight MapReduce runtime called Twister, which is better suited to iterative algorithms than recent distributed approaches. First, we describe the theoretical foundations underpinning our distributed formal concept analysis approach. Second, we provide a representative exemplar of how a classic centralized algorithm can be implemented in a distributed fashion using our methodology: we modify Ganter's classic algorithm by introducing a family of MR* algorithms, namely MRGanter and MRGanter+, where the prefix denotes the algorithm's lineage. To evaluate the factors that impact distributed algorithm performance, we compare our MR* algorithms with the state of the art. Experiments conducted on real datasets demonstrate that MRGanter+ is efficient, scalable and an appealing algorithm for distributed problems. Comment: 17 pages, ICFCA 201, Formal Concept Analysis 201

    A Partial-Closure Canonicity Test to Increase the Efficiency of CbO-Type Algorithms

    Computing formal concepts is a fundamental part of Formal Concept Analysis, and the design of increasingly efficient algorithms to carry out this task is a continuing strand of FCA research. Most approaches suffer from the repeated computation of the same formal concepts. Initially, algorithms concentrated on efficient searches through already computed results to detect these repeats, until the so-called canonicity test was introduced. The canonicity test meant that it was sufficient to examine the attributes of a computed concept to determine its newness: searching through previously computed concepts was no longer necessary. The employment of this test in Close-by-One type algorithms has proved to be highly effective. The typical CbO approach is to compute a concept and then test its canonicity. This paper describes a more efficient approach, whereby a concept need only be partially computed in order to carry out the test. Only if it passes the test does the computation of the concept need to be completed. This paper presents this ‘partial-closure’ canonicity test in the In-Close algorithm and compares it to a traditional CbO algorithm to demonstrate the increase in efficiency.
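The idea of testing canonicity before completing the closure can be illustrated with a minimal Python sketch. It is a simplified illustration written for this listing, not the In-Close implementation: the function name `concepts_with_partial_test`, the representation of the context as a list of attribute-index sets (`rows`), and the helper names are all assumptions.

```python
def concepts_with_partial_test(rows, n_attrs):
    """Enumerate formal concepts of a context in Close-by-One style.

    rows[g] is the set of attribute indices held by object g (assumed layout).
    Returns a list of (extent, intent) pairs as frozensets.
    """
    found = []

    def has_all(extent, i):
        # True when every object in the extent has attribute i.
        return all(i in rows[g] for g in extent)

    def passes_partial_test(extent, parent_intent, j):
        # Partial closure: look only at attributes below j. If the new
        # extent shares such an attribute that the parent intent lacks,
        # the concept was already generated on another branch.
        for i in range(j):
            if i not in parent_intent and has_all(extent, i):
                return False
        return True

    def expand(extent, intent, start):
        found.append((frozenset(extent), frozenset(intent)))
        for j in range(start, n_attrs):
            if j in intent:
                continue
            child_extent = [g for g in extent if j in rows[g]]
            if not passes_partial_test(child_extent, intent, j):
                continue  # rejected without completing the closure
            # Only now complete the closure for attributes from j upwards.
            child_intent = set(intent)
            child_intent.update(
                i for i in range(j, n_attrs) if has_all(child_extent, i)
            )
            expand(child_extent, child_intent, j + 1)

    all_objects = list(range(len(rows)))
    top_intent = {i for i in range(n_attrs) if has_all(all_objects, i)}
    expand(all_objects, top_intent, 0)
    return found
```

For the three-object context `[{0, 1}, {1, 2}, {0, 2}]` with three attributes, the sketch returns each of the eight concepts exactly once, without searching through previously found concepts.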

    Using formal concept analysis to detect and monitor organised crime

    This paper describes some possible uses of Formal Concept Analysis in the detection and monitoring of Organised Crime. After describing FCA and its mathematical basis, the paper suggests, with some simple examples, ways in which FCA and some of its related disciplines can be applied to this problem domain. In particular, the paper proposes FCA-based approaches for finding multiple instances of an activity associated with Organised Crime, finding dependencies between Organised Crime attributes, and finding new indicators of Organised Crime from the analysis of existing data. The paper concludes by suggesting that these approaches will culminate in the creation and implementation of an Organised Crime ‘threat score card’, as part of an overall environmental scanning system that is being developed by the new European ePOOLICE project.

    A Proposition for Combining Pattern Structures and Relational Concept Analysis

    In this paper we propose an adaptation of the RCA process enabling the relational scaling of pattern structures. In a nutshell, this adaptation allows the scenario where RCA needs to be applied in a relational context family composed of pattern structures instead of formal contexts. To achieve this, we define heterogeneous pattern structures as a model to describe objects in a combination of spaces, namely the original object description space and the set of relational attributes derived from the RCA scaling process. We frame our approach in the problem of characterizing latent variables (LV) in a latent variable model of documents and terms. LVs are used as compact and improved dataset representations. We approach the problem of LV characterization, which is missing from the original LV-model, through the application of the adapted RCA process using pattern structures. Finally, we discuss the implications of our proposition.

    Querying a Bioinformatic Data Sources Registry with Concept Lattices

    ISSN 0302-9743 (Print) 1611-3349 (Online), ISBN 978-3-540-27783-5. Bioinformatic data sources available on the web are numerous and heterogeneous. The lack of documentation and the difficulty of interacting with these data banks require users to be competent in both informatics and biology in order to make optimal use of the sources' contents, which remain rather underexploited. In this paper we present an approach based on formal concept analysis to classify and search relevant bioinformatic data sources for a given user query. It consists of building the concept lattice from the binary relation between bioinformatic data sources and their associated metadata. The concept built from a given user query is then merged into the concept lattice. The result is given by extracting the set of sources belonging to the extents of the subsumers of the query concept in the resulting concept lattice. The ranking of the sources is given by the concept specificity order in the concept lattice. An improvement of the approach consists of automatic refinement of the query using domain ontologies. Two forms of refinement are possible: by generalisation and by specialisation.

    Book Reviews

    This paper presents a program, called In-Close2, that is a high performance realisation of the Close-by-One (CbO) algorithm. The design of In-Close2 is discussed and some new optimisation and data preprocessing techniques are presented. The performance of In-Close2 is favourably compared with another contemporary CbO variant called FCbO. An application of In-Close2 is given, using minimum support to reduce the size and complexity of a large formal context. Based on this application, an analysis of gene expression data is presented. In-Close2 can be downloaded from Sourceforge.

    Scalable Estimates of Concept Stability

    Data mining aims at finding interesting patterns in datasets, where "interesting" means reflecting intrinsic dependencies in the domain of interest rather than just in the dataset. Concept stability is a popular relevancy measure in FCA. The experimental results of this paper show that high stability of a concept for a context derived from the general population suggests that concepts with the same intent in other samples drawn from the population also have high stability. A new estimate of stability is introduced and studied. It is experimentally shown that the introduced estimate gives a better approximation than the Monte Carlo approach introduced earlier.
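As background for the comparison mentioned above, the following Python sketch contrasts exact stability with a plain Monte Carlo estimate. It assumes the usual definition of stability as the fraction of subsets of a concept's extent whose common attributes equal the concept's intent. The abstract does not specify the paper's new estimate, so it is not reproduced here; the function names and the list-of-sets context layout (`rows`) are illustrative assumptions.

```python
import random
from itertools import combinations


def common_attrs(rows, objs, n_attrs):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    attrs = set(range(n_attrs))
    for g in objs:
        attrs &= rows[g]
    return attrs


def exact_stability(rows, extent, intent, n_attrs):
    """Fraction of all subsets of the extent whose shared attributes equal the intent."""
    extent = list(extent)
    hits = 0
    for k in range(len(extent) + 1):
        for subset in combinations(extent, k):
            if common_attrs(rows, subset, n_attrs) == set(intent):
                hits += 1
    return hits / 2 ** len(extent)


def monte_carlo_stability(rows, extent, intent, n_attrs, samples=2000, seed=0):
    """Estimate stability from uniformly sampled subsets of the extent."""
    rng = random.Random(seed)
    extent = list(extent)
    hits = 0
    for _ in range(samples):
        # Keeping each object with probability 1/2 samples subsets uniformly.
        subset = [g for g in extent if rng.random() < 0.5]
        if common_attrs(rows, subset, n_attrs) == set(intent):
            hits += 1
    return hits / samples
```

The exact computation enumerates every subset of the extent, so it is exponential in the extent size; sampling trades that cost for approximation error, which is the motivation for cheaper estimates.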